Generalist models, which are capable of performing diverse multi-modal tasks in a task-agnostic way within a single model, have been explored recently. Though a promising path toward general-purpose AI, existing generalist models are still at an early stage, where modality and task coverage is limited. To empower multi-modal task-scaling and speed up this line of research, we release a generalist model learning system, OFASys, built on top of a declarative task interface named multi-modal instruction. At the core of OFASys is the idea of decoupling multi-modal task representations from the underlying model implementations. In OFASys, a task involving multiple modalities can be defined declaratively, even in a single line of code. The system automatically generates task plans from such instructions for training and inference, and it also facilitates multi-task training for diverse multi-modal workloads. As a starting point, we provide presets of 7 different modalities and 23 highly diverse example tasks in OFASys, with which we also develop a first-of-its-kind single model, OFA+, that can handle text, image, speech, video, and motion data. The single OFA+ model achieves 95% of the performance of 15 task-finetuned models on average with only 16% of their parameters, showcasing the performance reliability of the multi-modal task-scaling that OFASys provides. Available at https://github.com/OFA-Sys/OFASys
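To make the declarative interface concrete, here is a minimal sketch of what a one-line multi-modal instruction and a parser for its modality slots could look like. The slot syntax (`[MODALITY:name]`, with `->` separating source from target) is an illustrative assumption in the spirit of the abstract; the exact OFASys syntax is defined in the repository and may differ.

```python
import re

# Hypothetical single-line multi-modal instruction (illustrative syntax,
# not necessarily the exact OFASys format): an image-captioning task that
# maps an IMAGE slot to a TEXT slot.
instruction = "[IMAGE:img] what does the image describe? -> [TEXT:cap]"

def parse_slots(template: str):
    """Extract (modality, name) slot pairs from an instruction template."""
    return re.findall(r"\[([A-Z]+):(\w+)\]", template)

source, target = instruction.split("->")
print(parse_slots(source))  # [('IMAGE', 'img')]
print(parse_slots(target))  # [('TEXT', 'cap')]
```

A system built on such an interface can dispatch each slot to a modality-specific encoder or decoder, which is what decouples the task description from the model implementation.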
Deep neural networks (DNNs) have been ubiquitously adopted in the Internet of Things and are becoming an integral part of our daily life. When tackling evolving learning tasks in the real world, such as classifying different types of objects, DNNs face the challenge of continually retraining themselves according to the tasks on different edge devices. Federated continual learning is a promising technique that offers a partial solution but has yet to overcome the following difficulties: significant accuracy loss due to limited on-device processing, negative knowledge transfer caused by the limited communication of non-IID data, and limited scalability in the number of tasks and edge devices. In this paper, we propose FedKNOW, an accurate and scalable federated continual learning framework, built on a novel concept of signature task knowledge. FedKNOW is a client-side solution that continuously extracts and integrates the knowledge of signature tasks that are highly influential on the current task. Each FedKNOW client is composed of a knowledge extractor, a gradient restorer and, most importantly, a gradient integrator. When training on a new task, the gradient integrator prevents catastrophic forgetting and mitigates negative knowledge transfer by effectively combining signature tasks identified from past local tasks and other clients' current tasks via the global model. We implement FedKNOW in PyTorch and extensively evaluate it against state-of-the-art techniques using popular federated continual learning benchmarks. Extensive evaluation results on heterogeneous edge devices show that FedKNOW improves model accuracy by 63.24% without increasing model training time, reduces communication cost by 34.28%, and achieves larger improvements under difficult scenarios such as large numbers of tasks or clients and training on complex networks.
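One common way to combine a current gradient with stored task gradients so that old tasks are not harmed is GEM-style projection: if the new gradient conflicts with a stored gradient (negative inner product), remove the conflicting component. The sketch below illustrates that general idea only; FedKNOW's actual gradient integrator may differ.

```python
import numpy as np

def integrate_gradient(g_new, past_grads):
    """Project the current task's gradient so it no longer conflicts
    (has a negative inner product) with any stored signature-task gradient.
    A GEM-style illustrative sketch, not FedKNOW's exact algorithm."""
    g = g_new.astype(float).copy()
    for g_past in past_grads:
        dot = g @ g_past
        if dot < 0:  # conflict: this step would increase loss on the past task
            g -= dot / (g_past @ g_past) * g_past
    return g

g_new = np.array([1.0, -1.0])   # current-task gradient
g_past = np.array([0.0, 1.0])   # stored signature-task gradient
g = integrate_gradient(g_new, [g_past])
print(g)  # conflicting component removed: [1. 0.]
```

After projection, the inner product with every stored gradient is non-negative, so a gradient step no longer moves against the retained tasks.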
Vision Transformers (ViTs) have achieved overwhelming success, yet they suffer from poor resolution scalability, i.e., their performance drops drastically when presented with input resolutions unseen during training. We introduce ResFormer, a framework built upon the seminal idea of multi-resolution training for improved performance across a wide spectrum of, mostly unseen, testing resolutions. In particular, ResFormer operates on replicated images of different resolutions and enforces a scale consistency loss to engage interactive information across different scales. More importantly, to alternate among varying resolutions, we propose a global-local positional embedding strategy that changes smoothly conditioned on input sizes. This allows ResFormer to cope with novel resolutions effectively. We conduct extensive experiments for image classification on ImageNet. The results provide strong quantitative evidence that ResFormer has promising scaling ability towards a wide range of resolutions. For instance, ResFormer-B-MR achieves a Top-1 accuracy of 75.86% and 81.72% when evaluated on relatively low and high resolutions, respectively (i.e., 96 and 640), which are 48% and 7.49% better than DeiT-B. We also demonstrate, among other things, that ResFormer is flexible and can be easily extended to semantic segmentation and video action recognition.
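The core of a scale consistency loss is to penalize disagreement between the predictions a model makes for the same image at two resolutions. A minimal sketch, assuming a KL divergence between the two class distributions (the paper's exact formulation may differ):

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def scale_consistency_loss(logits_lo, logits_hi):
    """KL(p_hi || p_lo): pull the low-resolution prediction toward the
    high-resolution one. Illustrative only; not ResFormer's exact loss."""
    p, q = softmax(logits_hi), softmax(logits_lo)
    return float(np.sum(p * np.log(p / q)))

# Identical predictions incur zero loss; divergent ones a positive loss.
assert scale_consistency_loss(np.ones(3), np.ones(3)) < 1e-9
print(scale_consistency_loss(np.array([2.0, 0.0, 0.0]),
                             np.array([0.0, 2.0, 0.0])))
```

Training on replicated images at several resolutions with such a term encourages representations that transfer across scales.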
Large language models (LLMs) have been shown to be able to perform new tasks based on a few demonstrations or natural language instructions. While these capabilities have led to widespread adoption, most LLMs are developed by resource-rich organizations and are frequently kept from the public. As a step towards democratizing this powerful technology, we present BLOOM, a 176B-parameter open-access language model designed and built thanks to a collaboration of hundreds of researchers. BLOOM is a decoder-only Transformer language model that was trained on the ROOTS corpus, a dataset comprising hundreds of sources in 46 natural and 13 programming languages (59 in total). We find that BLOOM achieves competitive performance on a wide variety of benchmarks, with stronger results after undergoing multitask prompted finetuning. To facilitate future research and applications using LLMs, we publicly release our models and code under the Responsible AI License.
Automatic depression detection on Twitter can help individuals privately and conveniently understand their mental health status at an early stage, before seeing mental health professionals. Most existing black-box deep learning methods for depression detection focus mainly on improving classification performance. However, explaining model decisions is imperative in health research, because decisions can often be high-stakes, even a matter of life and death. Reliable automatic diagnosis of mental health problems, including depression, should be supported by credible explanations that justify the model's predictions. In this work, we propose a novel explainable model for depression detection on Twitter. It comprises a novel encoder that combines a hierarchical attention mechanism with feed-forward neural networks. To support psycholinguistic studies, our model leverages metaphor concept mappings as input. Thus, it not only detects depressed individuals, but also identifies the features of such users' tweets and the associated metaphor concept mappings.
In this work, we study a simple yet universally applicable case of reward shaping in value-based deep reinforcement learning (DRL). We show that reward shifting in the form of a linear transformation is equivalent to changing the initialization of the Q-function under function approximation. Based on this equivalence, we bring the key insight that a positive reward shift leads to conservative exploitation, while a negative reward shift leads to curiosity-driven exploration. Accordingly, conservative exploitation improves value estimation in offline RL, and optimistic value estimation improves exploration in online RL. We validate our insight on a range of RL tasks and show its improvement over baselines: (1) in offline RL, conservative exploitation improves performance on top of off-the-shelf algorithms; (2) in online continuous control, multiple value functions with different shifting constants can be used to tackle the exploration-exploitation dilemma for better sample efficiency; (3) in discrete-control tasks, a negative reward shift improves over curiosity-based exploration methods.
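The claimed equivalence can be checked numerically in a one-state, one-action MDP: shifting every reward by a constant c shifts the Bellman fixed-point Q-value by exactly c / (1 - gamma), i.e. it acts like a constant bias in the Q-function's initialization.

```python
# Toy check of the reward-shifting equivalence in a single-state MDP.
gamma, r, c = 0.9, 1.0, 0.5

def fixed_point_q(reward, gamma, iters=10_000):
    """Iterate the Bellman backup Q <- r + gamma * Q to its fixed point."""
    q = 0.0
    for _ in range(iters):
        q = reward + gamma * q
    return q

q_base = fixed_point_q(r, gamma)       # converges to r / (1 - gamma)
q_shifted = fixed_point_q(r + c, gamma)
print(round(q_shifted - q_base, 6))    # c / (1 - gamma) = 5.0
```

A positive c therefore inflates all Q-values uniformly (a pessimistic/conservative baseline relative to the shifted returns), while a negative c deflates them, which is what yields the optimism that drives exploration.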
Nowadays, foundation models have become one of the fundamental infrastructures in artificial intelligence, paving the way toward general intelligence. However, reality presents two urgent challenges: existing foundation models are dominated by the English-language community, and users often have limited resources and thus cannot always make use of foundation models. To support the development of the Chinese-language community, we introduce an open-source project called Fengshenbang, led by the research center for Cognitive Computing and Natural Language (CCNL). Our project has comprehensive capabilities, including large pre-trained models, user-friendly APIs, benchmarks, datasets, and more. We wrap all of these into three sub-projects: the Fengshenbang models, the Fengshen framework, and the Fengshen benchmark. Fengshenbang's open-source roadmap aims to re-evaluate the open-source community of Chinese pre-trained large models, prompting the development of the entire Chinese large-model community. We also aim to build a user-centered open-source ecosystem that allows individuals to access the models they need to match their computing resources. Furthermore, we invite companies, universities, and research institutions to collaborate with us in building the ecosystem of large-scale open-source models. We hope this project will become the foundation of Chinese cognitive intelligence.
We present FOLIO, a human-annotated, open-domain, and logically complex and diverse dataset for reasoning in natural language (NL), equipped with first-order logic (FOL) annotations. FOLIO consists of 1,435 examples (unique conclusions), each paired with one of 487 sets of premises that serve as rules for deductively reasoning about the validity of each conclusion. The logical correctness of the premises and conclusions is ensured by their parallel FOL annotations, which are automatically verified by our FOL inference engine. In addition to the main NL reasoning task, the NL-FOL pairs in FOLIO automatically constitute a new NL-FOL translation dataset that uses FOL as the logical form. We systematically evaluate the FOL reasoning ability of medium-sized language models (BERT, RoBERTa) fine-tuned on FOLIO and of few-shot prompting with large language models (GPT-NeoX, OPT, GPT-3, Codex) through extensive experiments. For NL-FOL translation, we experiment with GPT-3 and Codex. Our results show that one of the most capable publicly available large language models (LLMs), GPT-3 davinci, performs only slightly better than random on a subset of FOLIO, and is especially poor at predicting the correct truth values of False and Unknown conclusions. Our dataset and code are available at https://github.com/yale-lily/folio.
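To illustrate the kind of deductive check such FOL annotations enable, the sketch below pairs NL premises with hypothetical FOL forms and verifies a conclusion by forward chaining over unary predicates. This is a toy model of the idea only; it is not an actual FOLIO entry and far simpler than a real FOL inference engine.

```python
# Hypothetical premises (illustrative, not taken from the dataset):
#   "ABCD is a square."                ~ Square(abcd)
#   "All squares are rectangles."      ~ forall x. Square(x) -> Rectangle(x)
# Conclusion: "ABCD is a rectangle."   ~ Rectangle(abcd)
facts = {"Square(abcd)"}
rules = [("Square", "Rectangle")]  # (antecedent, consequent) predicate pairs

def derive(facts, rules):
    """Forward-chain unary-predicate rules to a fixed point."""
    derived = set(facts)
    changed = True
    while changed:
        changed = False
        for pre, post in rules:
            for f in list(derived):
                if f.startswith(pre + "("):
                    new_fact = post + f[len(pre):]
                    if new_fact not in derived:
                        derived.add(new_fact)
                        changed = True
    return derived

print("Rectangle(abcd)" in derive(facts, rules))  # True
```

Because every FOLIO example carries machine-checkable FOL, a conclusion's True/False/Unknown label can be validated mechanically rather than by annotator judgment alone.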
Given the growing popularity of exploratory data analysis (EDA), understanding the underlying causes of the knowledge acquired through EDA is crucial, but it remains under-researched. This study promotes, for the first time, a transparent and explicable perspective on data analysis, called explainable data analysis (XDA). XDA provides data analysis with qualitative and quantitative explanations of causal and non-causal semantics. In this way, XDA significantly improves human understanding of, and confidence in, data analysis results, facilitating accurate data interpretation and decision-making in the real world. To this end, we propose XInsight, a general framework for XDA. XInsight is a three-module, end-to-end pipeline designed to extract causal graphs, translate causal primitives into XDA semantics, and quantify the contribution of each explanation to a data fact. XInsight uses a set of design concepts and optimizations to address the inherent difficulties of integrating causality into XDA. Experiments on synthetic and real-world datasets, together with human evaluations, demonstrate the highly promising capability of XInsight.
Pretraining molecular representation models without labels is fundamental to various applications. Conventional methods mainly process 2D molecular graphs and focus solely on 2D tasks, making their pretrained models unable to characterize 3D geometry and thus deficient for downstream 3D tasks. In this work, we tackle 3D molecular pretraining in a complete and novel sense. In particular, we first propose to adopt an equivariant energy-based model as the backbone for pretraining, which enjoys the merit of fulfilling the symmetry of 3D space. We then develop a node-level pretraining loss for force prediction, where we further exploit the Riemann-Gaussian distribution to ensure the loss is E(3)-invariant, enabling more robustness. Moreover, a graph-level noise scale prediction task is also leveraged to further promote the final performance. We evaluate our model, pretrained on the large-scale 3D dataset GEOM-QM9, on two challenging 3D benchmarks: MD17 and QM9. Experimental results demonstrate the better efficacy of our method compared with current state-of-the-art pretraining approaches, and validate the effectiveness of our design.